9 - Deep Learning - Feedforward Networks Part 4 [ID:13521]

Welcome everybody to our next video on deep learning.

So today we want to talk about feedforward networks again, the fourth part, and the main

focus today will be layer abstraction.

Of course we talked about those neurons and individual nodes, but this grows really complex

for larger networks.

So we want to introduce this layer concept also in our computation of the gradients.

So yeah, this is really useful because we can then talk directly about gradients on

entire layers and don't need to go through all of the individual nodes.

So how do we express this?

And let's recall what our single neuron is doing.

The single neuron essentially computes an inner product of its weight vector with the input.

And by the way, we are now skipping over the explicit bias notation.

So we expand the weight vector by one additional element, and the x vector by one additional

element that is one, and this allows us to describe the bias also within the inner product

that is shown on this slide.

So let's magnify this a bit so that you can read the formulas better.

And this is really nice because then you can see that the output prediction Y hat is just

an inner product.
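
Written out with the bias absorbed into the augmented vectors, this is roughly the following (the exact symbols may differ from the slide; here the last weight entry plays the role of the bias):

```latex
% single neuron: inner product of the augmented weight and input vectors
\[
\hat{y} = \mathbf{w}^{T}\mathbf{x},
\qquad
\mathbf{x} = (x_{1}, \ldots, x_{N}, 1)^{T},
\qquad
\mathbf{w} = (w_{1}, \ldots, w_{N}, w_{0})^{T}
\]
```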

Now let's think about the case that we have M neurons, which means that we get some y

hat of m for every neuron, and all of them are inner products.

So if you bring this into a vector notation, stacking the weight vectors of all M neurons

as the rows of a matrix W, you can see that the vector y hat is nothing else than the

multiplication of x with this matrix W. And you see that a fully connected layer is nothing else than a matrix multiplication.
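
In formulas, the stacking step looks roughly like this (the row-wise arrangement of the weight vectors is my notation; the slide may order things differently):

```latex
% M neurons: each row of W is one neuron's augmented weight vector
\[
\hat{y}_{m} = \mathbf{w}_{m}^{T}\mathbf{x}, \quad m = 1, \ldots, M
\qquad\Longrightarrow\qquad
\hat{\mathbf{y}} =
\begin{pmatrix}
\mathbf{w}_{1}^{T} \\ \vdots \\ \mathbf{w}_{M}^{T}
\end{pmatrix}
\mathbf{x}
= \mathbf{W}\mathbf{x}
\]
```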

Of course, we are building on all these great abstractions that people have invented over

the millennia, such as matrix multiplications.

So we can essentially represent arbitrary connections and topologies using this

fully connected layer.

And then we also apply a point-wise non-linearity such that we really get this non-linear effect

here.
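
So the complete forward computation of the layer can be summarized as follows, with sigma standing for whatever point-wise non-linearity is chosen (the symbol is my choice, not necessarily the slide's):

```latex
% point-wise non-linearity applied to every component of Wx
\[
\hat{\mathbf{y}} = \sigma(\mathbf{W}\mathbf{x}),
\qquad
\hat{y}_{m} = \sigma\!\left(\mathbf{w}_{m}^{T}\mathbf{x}\right)
\]
```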

Now the nice thing about the matrix notation is of course that we can now describe the

entire layer's derivatives using matrix calculus.

So our fully connected layer would then get the following configuration.

Let's consider an input with three elements.

Then, for every neuron, let's say we have two neurons, we get a weight vector.

We multiply the two, and in the forward pass we have simply determined this y hat.
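
As a minimal NumPy sketch of this forward pass, assuming three inputs plus the appended one and two neurons (function and variable names, as well as the sigmoid as the non-linearity, are my own choices, not from the lecture):

```python
import numpy as np

def fc_forward(W, x):
    # fully connected layer: a matrix multiplication followed by a
    # point-wise non-linearity (here a sigmoid, just as an example)
    z = W @ x
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0, 1.0])   # three inputs, plus a trailing 1 for the bias
W = np.random.randn(2, 4)             # two neurons, one weight row per neuron
y_hat = fc_forward(W, x)              # shape (2,), one output per neuron
```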

For this module, if we want to compute the gradients, then we need exactly two gradients,

and they are the same gradients as we already mentioned.

We need the gradient with respect to the weights, which is going to be the partial derivative

with respect to W, and the partial derivative with respect to x for the backpropagation, to pass

it on to the next module.
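
In symbols, the two quantities this layer module has to provide during backpropagation are:

```latex
\[
\frac{\partial L}{\partial \mathbf{W}}
\quad \text{(used for the weight update)}
\qquad \text{and} \qquad
\frac{\partial L}{\partial \mathbf{x}}
\quad \text{(passed on to the next module in the backward pass)}
\]
```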

So how does this work out?

Well, we have the layer that is y hat equals W times x, so there is a matrix multiplication

in the forward pass, and then we need the gradient with respect to the weights.

And now you can see that what we essentially need here is a matrix derivative,

and the derivative of y hat with respect to W is going to be simply x transpose.

So if we have the loss gradient that comes into our module, the update for our weights is going

to be this loss vector multiplied by x transpose.

So we have some loss vector and x transpose, which essentially means that you have two vectors whose product, a column vector times a row vector, gives a matrix of the same shape as W.
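
A minimal sketch of this backward step, treating the layer as y hat equals W x as on the slide and leaving the point-wise non-linearity to a separate module. The gradient with respect to x uses the standard matrix-calculus counterpart, W transpose times the incoming gradient, which this excerpt does not show explicitly; all names are my own:

```python
import numpy as np

def fc_backward(W, x, dL_dy):
    # dL_dy: loss gradient arriving from the following module, shape (M,)
    # gradient w.r.t. the weights: the incoming loss vector times x^T,
    # i.e. an outer product with the same shape as W
    dL_dW = np.outer(dL_dy, x)        # shape (M, N+1)
    # gradient w.r.t. the input, passed back to the next module in the backward pass
    dL_dx = W.T @ dL_dy               # shape (N+1,)
    return dL_dW, dL_dx

x = np.array([0.5, -1.0, 2.0, 1.0])   # augmented input stored from the forward pass
W = np.random.randn(2, 4)             # two neurons
dL_dy = np.array([0.1, -0.3])         # example incoming gradient
dL_dW, dL_dx = fc_backward(W, x, dL_dy)
```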

Part of a video series:
Accessible via: Open access
Duration: 00:17:42 min
Recording date: 2020-04-18
Uploaded on: 2020-04-19 01:16:09
Language: en-US

Deep Learning - Feedforward Networks Part 4

This video explains backpropagation at the level of layer abstraction.

Video References:
Lex Fridman's Channel
 


Further Reading:
A gentle Introduction to Deep Learning 

Tags: backpropagation, artificial intelligence, deep learning, machine learning, pattern recognition, Feedforward Networks, Gradient descent